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1       Introduction 

Practical  planning  problems  with  deterministic  forecasts  of  inherently  uncertain  parameters  often 
yield  unsatisfactory  solutions.  Stochastic  programming  formulations  allow  uncertain  parameters  to 
be  modeled  as  random  variables  with  known  distribution,  but  the  size  of  the  resulting  mathematical 
programs  can  be  formidable.  Dantzig  [5]  and  Beale  [1]  introduced  stochastic  programming  with 
recourse;  some  example  applications  from  the  literature  include  capacity  expansion  planning  [7], 
forest  harvest  planning  [12],  hydroelectric  scheduling  [20,27],  and  portfolio  management  [22,26]. 

Stochastic  programming  algorithms  are  typically  one  of  three  types:  (i)  exact  solution  procedures, 
(ii)  approximation  and  bounding  schemes,  and  (iii)  sampling-based  methods.  Exact  methods  include 
simplex-based  algorithms  that  exploit  special  structure  of  bases  [17],  decomposition  or  L-shaped 
schemes  [2,31],  interior  point  methods  [24],  and  the  Progressive  Hedging  algorithm  [29].  A  classic 
approximation  scheme  involves  calculating  deterministic  lower  and  upper  bounds  via  the  inequalities 
of  Jensen  and  Edmundson-Madansky,  respectively;  see  [3,4,11,21]  for  extensions  and  alternatives. 
Stochastic  quasigradient  (SQG)  methods  [9,13]  are  sampling-based.  Another  set  of  sampling-based 
algorithms  are  rooted  in  the  L-shaped  method  [6,14,19,27].  In  many  models,  as  the  number  of  random 
parameters  and  the  number  of  scenarios  grow,  exact  solution  procedures  and  approximation  and 
bounding  schemes  become  more  difficult  to  apply  due  to  required  computational  effort.  Sampling- 
based  algorithms  may  provide  an  attractive  alternative  for  such  models.  Stopping  criteria  for  SQG 
methods  are  examined  in  [28].  In  general,  however,  sampling-based  approaches  lack  stopping  rules 
that  can  control  a  priori  the  quality  of  the  proposed  solution.  In  this  paper,  we  develop  rules 
designed  to  rectify  this  shortcoming  for  a  particular  class  of  sampling-based  algorithms. 

A  host  of  questions  arise  when  one  replaces  deterministic  upper  and  lower  objective  function 
bounds  generated  by  a  decomposition  algorithm  with  estimates  formed  from  sample  means.  How 
should  an  "optimal"  solution  be  characterized  and  how  should  the  sampling  procedure  proceed  so 
as  to  ensure  an  appropriately  defined  notion  of  convergence?  The  primary  purpose  of  this  paper 
is  to  provide  a  framework  in  which  these  issues  may  be  addressed.  The  stopping  rules  we  develop 
are  comprised  of  two  components:  (i)  a  criterion  for  terminating  the  algorithm  and  (ii)  a  rule  for 
selecting  the  sample  sizes.  The  main  results,  detailed  in  §2,  provide  stopping  rules  that  guarantee 
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asymptotic  validity  of  the  desired  confidence  interval  statements  as  the  interval  width  shrinks  and  the 
sample  sizes  grow.  Application  of  these  methods  to  an  algorithm  for  a  class  of  multistage  stochastic 
linear  programs  [27]  is  described  in  §3.  Empirical  coverage  results  for  a  simple  example  are  given  in 
§4,  and  the  paper  is  summarized  in  §5. 

2       Stopping  Rules 

This  section  begins  by  outlining  our  framework  of  assumptions  on  the  underlying  sampling-based 
algorithm.  Consider  the  following  general  optimization  problem: 

z*  =     minimize   z(x)  ,... 

subject  to  x  €  X . 

Suppose  we  have  at  hand  an  algorithm  that  at  each  iteration  k  selects  a  sample  size  nk,  produces  a 
feasible  decision  Xk,  and  generates  estimates  for  upper  and  lower  bounds  on  the  optimal  objective 
function  value  denoted  Uk(fik)  and  Ljt(njt),  respectively.  It  is  assumed  that  at  each  iteration  k  the 
difference  random  variable,  Dk(nk)  —  Uk(nk)  —  Lk(nk),  satisfies  the  central  limit  theorem  (CLT): 

y/nk(Dk(nk)  -  Hk)  =>  N(0,al)  as  nk  ->  oo,       where  ak  >  0  (2) 

and  where  =>  denotes  convergence  in  distribution;  N(n,o~2)  denotes  a  normal  random  variable  with 
mean  /i  and  variance  cr2.  The  sample  size  parameter  will  typically  be  suppressed  when  referring  to 
the  upper,  lower,  and  difference  random  variables.  The  true  upper  and  lower  bounds  at  iteration 
k  are  denoted  Uk  and  Ik  and  satisfy  Uk  =>  Uk  and  Lk  =>•  Ik  as  n*  — »•  oo,  where  Ik  <  z*  <  u*, 
and  (necessarily)  /ijt  =  Uk  —  h-  If  Uk  and  Lk  are  independent  and  satisfy  respective  CLT's  then 
hypothesis  (2)  follows  as  a  consequence. 

The  algorithm  is  terminated  on  the  first  iteration,  T,  in  which  the  difference  random  variable 
drops  below  zero;  i.e., 

T  =  inf  {k  :  Dk<0}.  (3) 

The  feasible  decision  xt  generated  at  the  random  stopping  iteration  satisfies  ut  >  z{%t)-  In 
addition,  we  assume  that,  at  the  stopping  iteration,  the  algorithm  permits  re-evaluation  of  the 
difference  random  variable  through  independent  resampling.  The  algorithm  is  said  to  stop  correctly 


if  z(xt)  <  z*  +  e  where  e  is  a  positive  confidence  interval  width.  Stopping  correctly  is  ensured  if 
A*x  <  e;  we  use  this  observation  and  the  CLT  hypothesis  (2)  to  prescribe  sample  sizes,  n*,  under 
which  a  statement  can  be  made  regarding  the  probability  that  the  algorithm  stops  correctly.  At  the 
heart  of  the  procedures  we  develop  is  the  idea  that  the  sample  sizes  must  increase  as  the  algorithm 
proceeds. 

In  order  to  illustrate  the  underlying  ideas  in  a  simple  setting,  we  assume  in  §2.1  that  the  difference 
random  variables  at  each  iteration  are  normally  distributed.  Under  this  assumption,  we  show  it 
suffices  to  increase  the  sample  size  at  a  rate  proportional  to  log  k  to  guarantee  that  the  probability 
of  stopping  correctly  satisfies  a  prescribed  confidence  level.  In  §2.2,  the  normality  assumption  is 
replaced  with  the  CLT  hypothesis  (2)  and  an  0(log  k)  sample-size  formula  is  provided  under  which 
the  results  of  §2.1  can  be  recovered  in  the  form  of  an  asymptotic  validity  result.  Moreover,  we  indicate 
that  with  respect  to  required  computational  effort,  the  C?(log  k)  sample-size  formula  is  preferable 
to  the  0(\og  k)  formula.  In  §2.3  we  verify  the  asymptotic  validity  result  under  a  weaker  history- 
dependent  CLT  hypothesis  in  which  the  difference  random  variable  may  depend  on  the  (potentially) 
random  history  of  the  algorithm  in  previous  iterations.  In  §2.4  we  address  issues  associated  with 
finite  stopping  times  and  incorporation  of  sample  variance  estimators. 

2.1      Stopping  Rules:  Normal  Differences 

In  this  subsection  we  replace  the  CLT  hypothesis  (2)  with  the  following  more  restrictive  assumption: 
At  the  kth  iteration  of  the  algorithm,  we  choose  a  sample  size  n*  and  then  observe  the  random 
variable: 

Dk~  N  (nk,<rl/nk)  ,       where      ak  >  0.  (4) 

Example  1  This  example  indicates  that  a  fixed  sample  size,  n*  =  n,  can  lead  to  unattractive 
results.  Suppose  u*  =  Hk  —  I*  >  £  for  k  —  1, . .  .,K;  /ijt  =  0  for  k  >  K  +  1;  and  a*  =  a  >  0  for  all  k. 
The  algorithm  will  not  stop  correctly  if  and  only  if  it  terminates  prior  to  iteration  K  +  1.  By 
choosing  K  sufficiently  large  we  can  make  the  probability  we  stop  correctly,  P{D\  >  0, . . . ,  Dk  > 
0}  =  [P{DX  >  0}]K,  arbitrarily  close  to  0.  ■ 


Theorem  2  provides  a  sample-size  formula  that  overcomes  the  difficulty  illustrated  in  Example  1; 
in  particular,  it  ensures  that  the  probability  the  algorithm  stops  correctly  satisfies  a  minimum 
prescribed  confidence  level  of  1  —  a;  e.g.,  0.95.  We  require  the  following  lemma  regarding  bounds  on 
the  tail  of  a  normal  distribution  (see  Feller  Chapter  VII  §1  [10]). 


Lemma  1  If  z  >  0  then 


P{N(0,l)>z\   -:    -Lie"'2/2. 
V2w  z 


Theorem  2  Assume  (3),  (4),  and  define 

nk>  (^)V+2plnfc),  (5) 

where  e  >  0,  0  =  max  {2  In  [((p)/(  V27  a)]  ,  l},  0  <  a  <  1,  and  C(p)  =  T,?=ik~P>  P  >  l-    Then 


Proof 

Dt  <  0  implies 


P{»T  >  * }     <  P{HT  >DT  +  e} 

oo 


t=i 


<X>{Djb</i*-e}. 


t=i 


Now  X^tLi[a/C(p)]  &   p  =  a;  thus,  it  suffices  to  show 


a 


P{Dk<^k-e}<-—k-r 
C(p) 


To  this  end  consider 


P{Dk<fik-e}     =P\—,  p:<-€y/nj;/<Tk\ 

=  P{JV(0,l)>cVnTM} 

<P{yV(0,l)>(/?  +  2pln/b)1/2} 
Since  j3  >  1,  the  tail  bound  from  Lemma  1  yields: 


P{Dk  <  nk-  e)  <  —==  exp 
v2tt 


--(/?  + 2p In  fc) 


Hence  it  suffices  to  show 

^=  e_ x„ ( fl  4- 1n\r\  k\     < 

C(p) 


-I(/?  +  2pln*) 


<^-k~r 


and  this  inequality  follows  from  the  definition  of  (3.  ■ 

Remark  1  The  coverage  result  of  Theorem  2  states  that  [0,  c]  is  a  (1  —  a)  ■  100%  confidence  interval 
for  ht-  In  terms  of  the  optimization  problem  (1)  this  implies: 

PU(xt)  <  2*  +e}  >  1  -  a. 

Remark  2  Tables  of  values  of  the  Riemann-Zeta  function,  C(p)>  mav  be  found  in  Dwight  [8];  C(p) 
is  also  an  intrinsic  function  in  some  mathematical  software  packages  such  as  Mathematical  [32].  We 
return  to  the  topic  of  choosing  values  for  the  parameter  p  to  minimize  the  total  number  of  samples 
required  in  §2.2. 

Remark  3  The  term  "kth  iteration"  should  be  interpreted  liberally;  a  better  phrase  is  "kth  stopping 
cycle"  with  one  possible  definition  as  follows:  A  stopping  cycle  consists  of  a  number  of  algorithm 
iterations  in  which  a  fixed  sample  size  is  used  plus  one  resampling  iteration  in  which  the  sample- 
size  formula  (5)  is  applied.  The  fixed  sample-size  phase  of  a  stopping  cycle  is  terminated  when 
a  heuristic  pre-test  is  passed;  e.g.,  a  negative  difference  is  observed.  The  sample-size  formula 
and  stopping  criterion  described  above  are  then  applied  to  a  realization  of  the  difference  random 
variable  generated  from  an  independent  sample.  This  idea,  illustrated  in  the  application  of  §3,  helps 
to  minimize  computational  burden. 

Remark  4  The  purpose  of  our  analysis  is  to  control  the  quality  of  the  proposed  solution  xt-  Any 
number  of  heuristic  stopping  rules  can  generate  a  feasible  solution,  xt1-  By  resampling  the  asso- 
ciated difference  random  variable,  one  can  form  a  (1  —  6)  ■  100%  confidence  interval  of  the  form 
\z*,  z*  +  Dyi  +  ws<TT'/\/n  ]  for  z(xt')  where  n  denotes  the  resampling  sample  size,  wi,  satisfies 
P{N(0, 1)  <  Wi}  —  1  —  6,  and  Dj.>  =  max{Z>r',0}.  (We  use  Dj.,  because  it  is  known  that  /ix'  >  0.) 
The  disadvantage  of  results  based  on  heuristic  stopping  rules  is  that  we  have  inadequate  control 
of  the  interval  width,  D£>  +  Wj<TT'/\/n-  The  primary  purpose  of  Theorem  2  (and  the  subsequent 
generalizations  we  present  in  this  paper)  is  to  provide  a  prion  control  on  the  confidence  interval 
width. 


Remark  5  We  regard  the  specific  values  of  z*  and  z(xt)  of  secondary  interest  relative  to  controlling 
the  quality  of  the  proposed  solution,  xt-  However,  if  Ut  and  Lt  are  each  normally  distributed  (or 
more  generally  satisfy  respective  CLT's)  then  one  can  also  develop  confidence  intervals  for  these 
quantities  through  the  resampling  procedure. 

Remark  6  The  coverage  result  of  Theorem  2  does  not  depend  on  any  convergence  structure  of 
the  optimization  algorithm.  The  sample-size  formula  is  designed  so  that  if  the  algorithm  does  not 
converge  then  the  stopping  criterion  will  not  be  satisfied  with  probability,  at  least,  1  —  a.  This 
property  contributes  to  the  conservative  nature  of  the  coverage  result  since  many  algorithms  have 
some  underlying  convergence  structure.  We  provide  conditions  under  which  finite  termination  can 
be  ensured  (with  probability  one)  in  §2.4. 

2.2      Stopping  Rules:  CLT  Differences 

In  §2.1,  under  the  assumption  of  normally  distributed  difference  random  variables,  we  derived  a  valid 
confidence  interval  for  all  positive  interval  widths,  e.  In  this  subsection,  we  replace  the  normality 
assumption  with  an  asymptotic  normality  hypothesis  (2)  and  provide  conditions  under  which  the 
confidence  interval  of  §2.1  is  valid  in  the  limit  as  the  interval  width  shrinks  and  the  sample  sizes 
grow.  In  particular,  we  show 

lira  P{z(xT{£))<z*  +  e}>l-a,  (6) 

where  the  stopping  time,  T  =  T(e)  is  once  again  the  first  iteration  in  which  the  observed  difference 
drops  below  zero. 

Example  1  was  appropriate  under  the  assumptions  of  §2.1  only  because  the  confidence  interval 
statement  was  made  for  all  positive  interval  widths;  for  sufficiently  small  e,  adequate  coverage  results 
are  achieved.  In  Example  2  we  again  use  an  identical  sample  size  for  all  iterations  and  construct 
a  problem  in  which  the  probability  of  stopping  correctly  is  zero  in  the  limit  as  the  interval  width 
shrinks  to  zero. 

Example  2  Let  Dk  ~  N(k-1,  n^1)  be  independent  and  define  m  =  [f-1]  — 1.  (The  ceiling  operator, 
[•],  yields  the  smallest  integer  greater  than  or  equal  to  its  argument.)  If  the  algorithm  terminates 


prior  to  iteration  m  +  1  it  has  stopped  incorrectly.  Thus, 

P{»T(<)<t)     <P{D!  >0,...,Dm  >0) 

m 

=  l[P{N(0,l)<k-1y/^} 


i  =  l 


-  ^"1/2 


Let  e  =  Cnk        >  0  so  that  the  sample  size  has  the  same  value  for  all  iterations  and  observe: 

m 

P{HT(e)  <  e}     <  Y[P  {N(0,  V  <  k-l(m  +  1)C} 


m 

<      J]     P{N(0,l)<(m+l)C/\m/2]} 

Jfc=fm/2l 

<[P{AT(0,l)<3C}]rm/21. 
Hence  P{nT(()  <  f}  -»  0  as  e  [  0.  ■ 

Example  2  demonstrates  that  a  fixed  sample  size,  no  matter  how  large,  may  lead  to  "confidence" 
intervals  with  unsatisfactory  coverage  properties.  The  key  idea,  once  again,  is  that  we  must  increase 
the  sample  size  as  the  algorithm  proceeds;  Theorem  3  provides  stopping  rules  under  which  the 
desired  confidence  intervals  are  asymptotically  valid. 

Theorem  3  Assume  (2),  (3),  and 

Dk  -  /it 


sup  E  exp 


7 


is  bounded  for    \j\  <  ^0.  (7) 


Let  0'  =  max  { 2  In  [j>(p)/(V2^  a)]  ,  l},  where  <j)(p)  =  ££i  k~P]nk,  p>  (2^)~1  and  0  <  a  <  1. 


"*>  (y)2  (/?'  +  2pln2fc)  (8) 


then  lim  P  {v-t(i)  <  f }  >  1  —  a. 

c[0  y   ' 


Proof 

-1 


Let  Zk  =  o~k    y/nk(Dk  —  /i*).  We  begin  as  in  the  proof  of  Theorem  2  and  see  that  it  suffices  to  show 

oo 

UmJ^ptZk  <  -((3' -r  2p\n2  k)1/2\  <a.  (9) 


k-\ 


Next  we  show  the  order  of  the  limit  and  sum  on  the  left  hand  side  of  (9)  may  be  exchanged  with 
equality  by  employing  the  dominated  convergence  theorem.  Applying  Markov's  inequality  (see,  e.g., 
Loeve  §9  [23])  to  e_7Z*,  7  >  0  we  have 

P{Zk  <-(/?'  +  2p In2  Jfc)1/2}  <exp[-7(/?'  +  2p In2  k)1/2    E  e~lZk  .  (10) 

With  7  =  70,  the  right  hand  side  of  (10)  is  bounded  above  by  {supfc>1  E  e~loZk  }  k~"1oS^ .  As 
f0yflp  >  1,  the  order  of  the  limit  and  sum  may  be  exchanged  with  equality  and  it  remains  to  show 

00 

J2 lim  P  {Zk  <  -(/?'  +  2pln2  k)1,2\  <  a. 

By  applying  the  CLT  (2)  we  may  complete  the  proof  in  a  fashion  analogous  to  that  of  Theorem  2.  ■ 
Under  a  weaker  hypothesis  on  the  difference  random  variables,  we  have  recovered  the  coverage 
result  of  Theorem  2  in  an  asymptotic  sense.  The  asymptotic  validity  result  of  Theorem  3  may  be 
interpreted  as  follows:  For  sufficiently  small  choices  of  e,  [z* ,  z*  +  e]  is  an  approximate  (1  —  a)  •  100% 
confidence  interval  for  z(xt)- 

We  now  address  two  issues  with  respect  to  the  hypotheses  of  Theorem  3.  First,  we  provide 
conditions  under  which  the  bounded  expectation  assumption  (7)  holds  and  then  we  examine  the 
function  4>(p).  Verification  of  (7)  is  straightforward  when  each  difference  random  variable  may  be 
expressed  as  a  sample  mean  of  independent  and  identically  distributed  random  variables  (i.i.d.r.v.'s); 
in  particular,  it  suffices  to  verify  that  the  underlying  random  variables  have  moment  generating 
functions  for  |7|  <  y0.  Indeed,  this  observation  is  at  the  core  of  an  elementary  proof  of  the  central 
limit  theorem  for  i.i.d.r.v.'s  (see,  e.g.,  Hogg  and  Craig  [16]).  Proposition  4  summarizes  this  result; 
we  use  the  following  notation:  For  each  k,  Dki,  Dk2,  ■  ■  ■  denote  i.i.d.r.v.'s,  Dk  =  ~-  J2l=i  -^**> 
EDki  =  /ijt,  and  E(Dk\  -  Hk)2  =  o\  where  0  <  a\  <  00. 

Proposition  4  //sup  E  exp  7  (  Dkl~lik  j     <  00  then  sup  E  exp  7  (  ok)~Jnk  )     ls  bounded. 

We  now  turn  our  attention  to  <f>(p)  =  YlT=i  k~plnh-  By  comparing  terms  of  the  4>(p)  series  and 
YlT=i  ^~2  f°r  sufficiently  large  k  we  conclude  <f>(p)  <  00  if  p  >  0.  Proposition  5  provides  bounds  on 
<j>(p)  that  facilitate  numerical  function  evaluations  and  further  characterizes  both  <j)(p)  and  £(p)  so 
that  we  may  subsequently  address  the  issue  of  choosing  good  values  for  the  parameter  p. 
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Proposition  5 

(i)  LetvN  =  yflp  [In  N  -  (2p)~1}  and  Kp  =  y/njp  exp  [^p)"1] ,  p  >  0.   Then 

N 

KpP{N(0,l)>vN+1)   <  4>(P)-Y,k~p]nk   <  KPP{N(0,l)>vN) 

k-\ 

and 

Kp  (P  {N(0, 1)  >  vN}  -P{N(0, 1)  >  vN+1})  <  N~"'nN. 


(it)    ln£(p)  and  \n<p(p)  are  convex  functions  on  (l,oo)  and  (0,oo),  respectively. 

Proof 

Based  on  the  inequalities 


we  find 


/■fc+1 

(Jb+l)-pln(i+1)   <     /        u~plnudu  <   k~p]nk 

f°°  ^  /-oo 

/        u-plnudu   <   <j>(p)-J2k-plnk    <  /      u-plnudu 

./7V+1  k_l  JN 


A  change  of  variables  yields 


'      u~p^udu  =   (2p)-1/2exp[(4p)-1]     /      e~u'/2du  =   Kp  P{N(0, 1)  >  vy)  , 

y  Jvy 

where  vy  =  y/2~p  [In  y  —  (2p)-1].   This  establishes  the  first  part  of  (i),  and  the  error  bound  follows 
from 

^OO  *CO 

/      u~p]nudu  -    /        u~p]nudu   <   N~p*nN. 
Jn  Jn+\ 

For  result  (ii)  we  require  the  infinite  series  version  of  Holder's  inequality: 


X>*6*i  ^ 


*=1 


oo 

EM' 

.Jt  =  l 

1/r 

oo 

DM' 

Jb=l 

ll/l 


where  r,  s  >  1  and  1/r  +  1/s  =  1.  For  the  Riemann-Zeta  function  we  must  show 


lnC(Ap1  +  (l-A)p2)   <  AlnC(p1)  +  (l-A)lnC(p2). 


Using  the  definition  of  £(p)  we  see  it  suffices  to  show: 


£(*-")*  (*-»rA  < 


k-l 


oo 

A 

oo 

.k  =  \ 

E*"pa 

Jfc=l 

1-A 


This  inequality  follows  from  Holder's  inequality.  The  proof  for  In  4>{p)  is  virtually  identical.  ■ 

The  0(log2  k)  sample-size  formula  of  Theorem  3  is  driven,  in  large  part,  by  the  moment  gener- 
ating function  hypothesis  (7).  By  making  the  stronger  assumption, 


sup  E  exp 


Dk  -  P-  k  X 


is  bounded  for     I7I  <  y0, 


Vk/y/nk  J 

the  0(\ogk)  sample-size  formula  of  Theorem  2  can  be  recovered;  the  difficulty,  however,  lies  in 
developing  analogs  of  Proposition  4.  The  moment  generating  function  hypothesis  is  attractive  be- 
cause it  is  easily  verified  and  because  with  respect  to  required  computational  effort,  the  O (log  k) 
sample-size  formula  is  preferable  to  the  0(\ogk)  formula  for  practical  problems.  For  the  purpose 
of  verifying  this  statement  we  will  assume  Cfc/e  is  constant  as  would  be  the  case  if  c  is  selected 
proportional  to  cr^  (see  subsequent  Remark  7).  Minimizing  the  total  number  of  samples  or  "work" 
over  T  iterations  then  corresponds  to  minimizing 

Wc(p)  =  T  ■  max  h\n   C(p)/(V27r  a)l  ,  l}  +  2p(^lnfc) 

for  sample-size  formula  (5)  or 

W4>(p)  =  T-  max  {2  In  [^(p)/(v^ra)]  ,  l}  +  2p  (  ^  In2  fc  J 

for  sample-size  formula  (8).  From  Proposition  5,  part  (ii)  it  is  clear  that  W^(p)  and  W<p(p)  are 
convex  functions  on  (l,oo)  and  (0,oo),  respectively.  Table  1  displays  the  results  of  minimizing  the 
respective  work  functions  for  various  choices  of  T  with  a  =  0.05.  While  T,  of  course,  is  unknown  a 
priori,  rough  estimates  for  T  (or  ET)  may  be  available  for  certain  classes  of  problems. 


T 

V 

formula  (5)  0(log  it) 

C(p')        W<(p*) 

formula  (8)  0(log'J  it) 

10 

25 

50 

102 

103 
104 
105 

1.5 
1.35 
1.29 
1.24 

1.15 
1.11 
1.09 

2.612  375 
3.459  237 
4.046  193 
4.761  075 

7.254  695 

9.676  075 

11.694  841 

106 

323 

731 

1  630 

21  715 

269  211 

3  199  021 

0.4 
0.25 
0.19 
0.155 

0.09 

0.065 

0.050 

5.048  588 

9.379  868 

14.865  183 

22.270  678 

94.647  997 
325.046  04 
1  175.999  4 

96 

291 

660 

1  473 

19  720 
246  153 

2  944  556 

Table  1:  Optimal  Choices  of  p 
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For  each  value  of  T,  Table  1  contains  the  minimized  (within  0.1%)  work  function  values,  the 
corresponding  minimizers,  p* ,  and  C(p*)  and  4>(p')  for  reference.  Table  1  indicates  that  the  0(log2  k) 
sample-size  formula  outperforms  the  0(\ogk)  formula  for  values  of  T  dramatically  larger  than  of 
practical  interest  (see  Remark  3).  The  reason  is  as  follows:  Sample-size  formula  (5)  requires  p  >  1, 
and  this  leads  to  a  multiplier  greater  than  2  on  the  2j*=i  'n  ^  term  of  the  work  function  while  the 
corresponding  term  for  W^p)  can  have  any  positive  coefficient.  Theorem  2,  of  course,  could  be 
restated  with  the  0{\og  k)  formula;  the  bounding  series  C(p")  was  initially  selected  because  it  lead 
to  a  better  rate  of  growth.  When  applying  Theorem  3,  one  may  not  be  able  to  use  the  recommended 
values  of  p*  due  to  the  p  >  (2jg)~1  condition.  In  many  practical  problems,  however,  the  underlying 
random  variables  may  be  bounded,  and  under  the  i.i.d.  hypotheses  of  Proposition  4  one  can  then 
choose  any  p  >  0. 

Well-designed  stopping  cycles  (Remark  3)  help  minimize  required  computational  effort  by  reduc- 
ing the  number  of  times  that  the  sample  size  is  increased.  However,  we  wish  to  emphasize  that  even 
under  poorly  designed  stopping  cycle  schemes  the  increase  in  computational  effort  as  the  algorithm 
proceeds  is  relatively  modest.  As  an  example,  consider  e  =  2cjt/v/30  and  a  =  0.05.  The  sample  sizes 
required  by  the  0{\og7  k)  sample-size  formula  with  p  =  0.155  at  the  1",  10'\  100'\  and  1000"1 
stopping  cycles  are  78,  91,  128,  and  189. 

2.3     Stopping  Rules:  History-Dependent  CLT  Differences 

We  can  generalize  the  results  of  Theorem  3  with  respect  to  the  assumptions  on  the  interaction  of 
the  algorithm  with  the  upper  and  lower  bound  estimates.  In  the  development  above,  we  assume 
{/ijt,  cric}^_1  is  simply  a  sequence  of  constants,  but  in  many  applications  these  parameters  may 
depend  on  the  random  history  of  the  algorithm  through  iteration  k  —  1  which  we  denote  7ik-\-  For 
example,  in  an  L-shaped  algorithm  for  two  stage  stochastic  programs  in  which  cuts  are  obtained  by 
sampling  (see  Dantzig  and  Glynn  [6]),  the  sequence  of  master  program  decisions  depends  on  which 
scenarios  were  (randomly)  selected  to  compute  cuts  in  previous  iterations.  In  this  example,  it  is 
clear  /i*,  cr*,  and  Dk  are  random  variables  that  are  sample-path  dependent.  A  realization  of  Tik 
may  be  thought  of  as  the  information  necessary  to  reconstruct  exactly  the  steps  of  the  algorithm 
through  iteration  k.   In  this  more  general  setting,  we  will  assume  that  conditioned  on  the  history 
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random  variable,  the  mean  and  variance  are  constants  and  the  difference  random  variables  satisfy 

(11) 


p\Dk7H<y 


^-hwlj-^du- 


<?k/y/nk 

Theorem  6  Assume  (3),  (7),  (11),  and  define  n*  and  the  corresponding  parameters  as  in  Theo- 
rem 3.   Then  lim  P  {^T(e)  <  e}  >  1  -  a. 

Proof 

Let  Zk  =  <y^1 y/nk(Dk  —  /**).  We  begin  as  in  Theorem  2  and  see  it  suffices  to  show 

oo 

lim]T  p  fzk  <  -(/?'  +  2pln2  k)1/2\  <  a. 


t-i 


From  the  proof  of  Theorem  3  we  know  hypothesis  (7)  permits  exchanging  the  order  of  limit  and 

summation  with  equality.  Now  observe 

oo  . 

J2  lim  /  P  {  Zk  <  -(/?'  +  2pln2  fc)1/2  |  W*_i }  dPHk_, 

00 

=  Y1  /HmP{^<-(/?'  +  2pln2^)1/2|  -Hk-^dPn,., 

oo         - 

=  E  J  p  {^(°> l)  ^  -w + 2Pln2  *)1/2}  dP«*-> 


k=\ 


=  JTp  { N(0, 1)  <  -(/?'  +  2pln2  k)1'2}  . 


k=\ 


The  remainder  of  the  proof  is  analogous  to  that  of  Theorem  2.  I 

Remark  7  The  asymptotic  validity  results  of  Theorems  3  and  6  both  still  hold  when  we  select  e 
proportional  to  crfc;  i.e.,  a  relative  precision  confidence  interval.  In  particular,  with  e  =  io~k  the  only 
technical  modification  required  is  replacing  lim£|o  with  lim£-jo- 

Remark  8  A  sufficient  condition  for  verifying  (7)  in  the  history-dependent  setting  is 


E  exp 


7 


Tik-\ 


<  M  for     |7|  <  70, 


(12) 


Gk/y/nk  ) 

where  M  and  f0  do  not  depend  on  the  iteration  k,  the  history,  Tijt,  or  the  confidence  interval  width, 
e  (which  determines  n^).  This  test  may  often  be  more  natural  to  apply  than  attempting  to  verify 
(7)  directly. 
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2.4      Stopping  Rules:  Additional  Considerations 

In  this  subsection  we  address  two  additional  issues.  First,  we  describe  how  sample  variance  estimates 
may  be  incorporated  in  confidence  interval  construction,  and  second,  we  examine  the  issue  of  finite 
stopping  times. 

2.4.1      Sample  Variance  Estimators 

If  the  confidence  interval  width,  e ,  is  a  sufficiently  small,  pre-selected,  fixed  value  then  we  interpret 
Theorems  3  and  6  as  providing  an  approximate  absolute  precision  confidence  interval.  However,  the 
population  variance  terms,  <r\,  are  typically  unknown;  hence  n*  is  unknown  and  the  procedure  is 
not  implementable.  Alternatively,  iff  is  proportional  to  o~k,  we  obtain  a  relative  precision  confidence 
interval  (see  Remark  7)  with  unknown  width  e<rT.  There  are  standard  approaches  to  this  difficulty 
based  on  well-known  results  from  parameter  estimation  in  statistics.  For  simplicity,  we  describe  one 
possible  approach  in  the  setting  of  Theorem  3  for  the  relative  precision  case.  We  replace  the  CLT 
hypothesis  (2)  with  the  following  assumption: 

Dk~^  =»JY(0,1)  asn^oo,  (13) 

Sk/y/rik 

where  s|  is  a  sample  variance  estimator.  We  may  then  form  a  "sample-variance"  analog  of  Theorem  3 
by  replacing  a\  with  s\  in  (7)  and  (8).  However,  this  result  is  of  limited  value  because  the  sample 
variance  equivalent  of  the  moment  generating  function  hypothesis  (7)  may  be  difficult  to  verify;  if 
Dk  is  a  sample  mean  of  i.i.d.  normal  random  variables  then  s^  y/nk(Dk  —  Hk)  is  a  Student's  T 
random  variable  and  does  not  have  a  moment  generating  function.  (The  situation  for  an  absolute 
precision  confidence  interval  is  at  best  unimproved.) 

A  simple  solution  to  this  difficulty  is  as  follows:  At  iteration  T  we  can  resample  the  difference 
random  variable  at  the  proposed  solution  xj  (see  Remark  4).  We  denote  this  random  variable  Dt 
and  assume  it  satisfies  the  CLT  (2)  with  mean  ht  and  variance  aT.  Coupled  with  a  weakly  consistent 
sample  variance  estimator,  sT,  this  ensures  the  sample  variance  version  of  the  CLT  hypothesis  (13) 
holds  for  Dt-  From  this  we  infer 


*,z*  +  D+  + 


W&ST 


T    '     V*  J 
13 


(14) 


is  an  approximate  100(1  —  6)%  confidence  interval  for  z(xt)  where  n  denotes  the  (resampling)  sample 
size  and  w&  satisfies  P{N(0, 1)  <«?«}  =  1  —  6.  As  Theorems  3  and  6  ensure  ht  is  not  too  large, 
we  have  a  priori  control  on  the  confidence  interval  width.  Restated:  For  sufficiently  small  e  and  for 
sufficiently  large  n  =  n(rj)  we  have 


<  DZ  H -=-  >  car  +  rj  >  «  a    for  any     r)  >  0. 

I  y/n  J 


(15) 


2.4.2      Finite  Stopping  Times 


It  would  be  undesirable  if  the  stopping  rules  we  have  developed  precluded  finite  termination  for 
"well-behaved"  algorithms.  To  this  end,  we  introduce  the  notion  of  a  stopping  tolerance  e'  satisfying 
0  <  e'  <  e  and  make  the  following  assumptions.  The  stopping  time  is  redefined  as 

T=  inf  {it  :  Dk  <  e'}  .  (16) 

Under  this  termination  criterion  P{T  >  m)  =  P{D\  >  e' ,  D^  >  e', . . . ,  Dm  >  e'},  and  we  assume 

/m 
\[P{Dk>e'\'Hk-l}dPnm_l  (17) 

jt=i 

which  is  a  natural  generalization  of  inter-iteration  independence  of  the  difference  random  variables 
to  the  history-dependent  setting.  With  regard  to  the  convergence  structure  of  the  algorithm  we 
assume  there  exists  a  subsequence  of  {//jJ^Lj  that  converges  to  zero  with  probability  one;  i.e., 

P  {u>  :  ftoo(u>)  implies  3  {fikj(uj)}°°=  1  such  that  fikj(u)  —  0}  =  1.  (18) 

In  this  framework  the  following  modification  of  Theorem  6  incorporates  a  finite  stopping  result. 

Theorem  7  Assume  (7),  (11),  (16),  e'  >  0  and  define 

nk>(-^\(3'  +  2pln2k) 

and  the  corresponding  parameters  as  in  Theorem  3. 
Then 

lim  P  {/iT(e)  5:  €}  •>  1  —  <*•      (asymptotic  validity) 
If,  in  addition,  (IS),  (17),  (18)  hold  and  0  <  e'  <  c  then 

P{T(e)  <  oo}  =  1.  (finite  stopping  time) 
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(19) 


Proof 

By  hypothesis  (17) 


P{T<  00}  >  1-  Jf[p{Dk  >J\Hk-i}dPHn_x, 


We  know  from  the  bounded  convergence  theorem  it  suffices  to  show 


lim  f[P{Dk>e'\Hk-i}  =  0  w.p.l. 


m— »oo 

Jt=l 


For  any  e'  >  0,  condition  (12)  and  n^/a^  — ►  00  as  k  — ►  00  ensures  there  exists  K  such  that  k  >  K 
implies  E  {\Dk  — /ijt|  \7ik-\}  <  f'/4-  Applying  (a  conditional  version  of)  Markov's  inequality  we 
find 

P{\Dk-lit\>  c'/2  I «*-i}  <  E{\Dk  -  /itl  I  W*-i}  /(e'/2). 

Define  /Cm  =  {fc  :   k  <  m,  k  >  K,  Hk  <  c'/2},  and  observe 

m 

IIP{I>*>«/|W*-i}<    I]   ^{P*-M*l>e72|Wfc-i}<(l/2)|JCm|. 

The  finite  stopping  result  follows  as  (18)  implies  \/Cm\  — ►  00  as  m  — *  00  w.p.l. 

The  proof  of  the  asymptotic  validity  result  is  virtually  identical  to  that  of  Theorem  6.  ■ 

3      Application 

A  T-stage  stochastic  linear  program  with  recourse  (SLP-T)  may  be  formulated  as  follows: 

r 
minimize        2_\    /.  P?'0?'1?' 

subject  to    -BEljxSl0  +  Afx?  =  6J\    *?'  >  0,    w(e  Q, 
for  *  =  1,...,T 

where  B%1  =  0. 

A  sample  point  (scenario)  in  the  stage  t  sample  space,  Qt,  is  denoted  u>t.  A  stage  t  >  2  scenario, 
u>j,  has  a  unique  stage  t  —  1  ancestor  denoted  a(wt),  and  a  stage  t  <  T  scenario  has  a  set  of  stage 
t  +  1  descendant  scenarios  denoted  A(ujt).  A  stage  t  realization,  £f  =  vec(cf  ,bf' ,  B^llt  Af),  is 
to  be  read  column-wise  as  a  vector  in  3tNt ,  where  Nt  =  nt  +  mt  +  mt  ■  nt-\  +  mt  ■  nt\  Af  is  an 
mt  x  nt  matrix  and  the  remaining  matrices  and  vectors  are  dimensioned  to  conform.   We  assume 
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£t  has  finite  support  and  a  probability  mass  function  given  by  P  {£t  =  $?*}  =  vV  ■  For  notational 
convenience,  we  assume  a  first  stage  sample  space,  Q,\  that  is  a  singleton  set  where  £%l  represents 
the  known  state  at  the  time  decisions  are  made  in  the  first  stage;  clearly,  p"1  has  value  one.  At 
the  time  decisions  are  made  in  stage  t,  the  observation  £|"'  and  the  previous  stage's  decision  x°_i 
are  known  to  the  decision  maker;  the  goal  is  to  find  a  first  stage  decision,  x\,  that  minimizes  the 
expected  cost  of  operating  the  system  over  T  stages. 

Pereira  and  Pinto  [27]  have  proposed  a  sampling  and  decomposition-based  algorithm  for  SLP-T 
models  with  interstage  independence  of  the  stochastic  parameters  and  a  "manageable"  number  of 
descendants,  |A(u>£)|,  for  each  node  in  the  scenario  tree.  Note  that  if  T  is  moderately  large,  appli- 
cation of  exact  methods  or  bounding  and  approximation  schemes  (see  §1)  may  be  computationally 
impractical.  The  basic  idea  behind  the  algorithm  is  to  compute  upper  bound  estimates  by  sampling 
paths  through  the  scenario  tree  on  a  forward  pass,  and  to  compute  deterministic  outer-linearization 
cuts  on  a  backward  pass.  Figures  1  and  2  are  designed  to  illustrate  this  concept. 


Figure  1:  Forward  pass  (left  to  right) 


Figure  2:  Backward  pass  (right  to  left) 


A  sample  path,  u>  =  (uii,U2, . . .  ,uj-),  specifies  exactly  one  node  per  stage  in  the  scenario  tree 
and  has  the  property  that  the  nodes  identify  a  path  from  the  stage  1  node  to  a  stage  T  leaf  node. 
Nonanticipativity  constraints  are  satisfied  on  a  forward  pass  along  a  sample  path:  The  first  stage 
problem  is  solved  with  only  previously  generated  cut  information  regarding  the  future.  The  first 
stage  decision  is  then  passed  to  the  right-hand-side  of  a  randomly  selected  second  stage  subproblem, 
sub(u>2),  and  the  process  is  then  repeated  from  stage  2  to  3.   A  forward  pass  along  a  sample  path 


16 


simulates  operation  of  the  system  given  the  cuts  collected  in  previous  iterations.  A  new  cut  is  then 
appended  to  the  set  of  cuts  at  each  stage  on  a  backward  pass  along  each  sample  path.  Under  the 
assumption  that  the  stochastic  parameters  are  interstage  independent,  the  cuts  computed  for  any 
stage  t  subproblem  are  valid  for  all  other  subproblems  on  that  stage.  The  algorithm  for  SLP-T  is 
summarized  in  Figure  3. 


step  0     let  A:  =  0; 

append  lower  bounding  cuts   8t  >  —Mt,  t  =  1,. ..  ,T—  1; 

step  1     solve  the  stage  1  master  program  and  obtain  (xk,0k)\ 
let  2*  =cxx\  +0\\ 

step  2     select  a  set  of  random  sample  paths  Sk  according  to  p^T ; 
do  w  E  Sk 

do  t  =  2  to  T 

form  RHS  of  sub(wt):  fi^i^S'V  +  hV\ 
solve  and  obtain  [z"']*; 
enddo 
enddo 
let  Jk  =  c,x\  +  ^  £we5*  Ef=2  C[<']fc; 

step  3     if  Jk  —  z_k  <  0  then  stop:  x\  is  the  proposed  solution; 

step  4     do  t  =  T  —  1  downto  1 
dow€5l 

do  w(+i  6  A(w() 

form  RHS  of  sub(wt+1):  B?,+1[x?Wi+l)]k  +  tig?; 
solve  and  obtain  dual  variables; 
enddo 

use  dual  variables  to  compute  cut; 
append  the  set  of  stage  t  cuts  with   0t  —  GfXt  >  9t\ 
enddo 
enddo 

step  5     let  k  =  k  +  1;  goto  step  1; 

Figure  3:  Decomposition  Algorithm  for  SLP-T 
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The  algorithm  of  Figure  3  generates  valid  lower  bounds  (i.e.,  they  contain  no  error  due  to 
sampling)  and  in  the  history-dependent  sense,  sample  mean  upper  bounds  on  the  optimal  objective 
function  value;  applying  Theorem  6  in  this  setting  is  straightforward.  The  first  stage  decision  at  the 
k  iteration,  £*,  is  a  random  variable;  it  depends  on  the  set  of  first  stage  cuts,  and  these  cuts,  in 
turn,  depend  on  which  sample  paths  were  selected  in  previous  iterations.  However,  given  the  history 
of  the  algorithm  through  the  first  k  —  1  iterations,  Wi_i,  the  decision  x\  is  known,  and  the  upper 
bound  estimate  lk  is  the  sum  of  a  deterministic  constant,  c\x\,  and  a  sample  mean  of  i.i.d.r.v.'s 
with  realizations  of  the  form  [z^Y  =  5Zt=2  c^'  [xt^']fc  •  Thus,  the  upper  bound  estimate  satisfies  a 
history-dependent  CLT  with  mean  Uk  =  c\x\  +  Ez\  and  variance  cr„  =  E  (zk  —  Ezk)  .  The  lower 
bound,  z_k  =  Ik,  is  deterministic  when  conditioned  on  Tik-i-  Thus  the  difference  random  variable 
Dk  =  z~k  —  z_k  satisfies  the  history-dependent  CLT  hypothesis  (11)  of  Theorem  6  where  p.k  =  Uk  —  Ik 


and  Gy.  =  u\k.   The  random  vectors  ft  =  vec(ct,  bt,  Bt-\,  At),  t  —  2,...,T  are  bounded  and  if 


we 


assume,  for  example,  that  the  subproblems  at  each  stage  have  bounded  primal  feasible  regions  then 
the  random  variable  zk  is  bounded.  Thus  the  conditional  moment  generating  function  hypothesis 
(12)  is  satisfied  (see  Proposition  4),  and  hypothesis  (7)  then  follows. 

We  now  describe  the  recommended  algorithm  for  SLP-T  based  on  the  stopping  rule  theory 
of  §2.  The  modified  algorithm  utilizes  the  idea  of  stopping  cycles  (see  Remark  3)  which  help  to 
minimize  computational  burden.  The  existing  steps  of  the  algorithm  in  Figure  3  are  modified 
as  follows.  To  step  0  we  append  "define  c  >  0;  let  v  =  0,  n„  >  0,  p  >  0,  0  <  a  <  1,  and 
let  (3'  =  max  {21n  [<j)(p)/(y/2n  a)]  ,  l}."  See  Section  2.2  for  recommended  values  of  p  and  the 
corresponding  values  of  <f>(p).  In  step  2  we  select  the  random  sample  Sk  to  be  of  size  nv.  In  step 
3,  if  the  heuristic  pre-test  is  satisfied  then  a  stopping  cycle  is  complete  and  we  go  to  step  6.  The 
additional  steps  are  detailed  in  Figure  4.  (Note  that  we  have  redefined  e  to  be  the  i  of  Remark  7  so 
that  we  are  employing  a  relative  precision  confidence  interval.) 

The  sample-size  formula  (8)  of  Theorem  6  is  satisfied  by  the  modified  algorithm,  and  hence  the 
asymptotic  validity  result  holds.  Note  P{p,r  <  c<?t}  >  1  —  a  implies  P{h  <  z(xt)  <  h  +  £<tt}  > 
1  —  a;  moreover,  in  this  setting,  /j  is  a  known  lower  bound  when  the  algorithm  terminates  (see 
Remark  5).    While  <tt  is  unknown,  as  a  practical  matter  one  may  be  satisfied  that  the  solution 
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step  6     let  v  =  v  +  1;  and  nu  =  \(~2  (/?'  +  2pln2  u)) 

step  7     independently  select  a  set  of  new  sample  paths  Sk  according  to  p^T  of  size  n„; 

dou  esk 

do  t  -  2  to  T 

form  RHS  of  sub(wt):  flj'ijx^0]*  +  V?*; 
solve  and  obtain  [x^']fc; 
enddo 
enddo 
let  Jk  =  Clx\  +  ^  Ew65-  Ef=2  CK'l'i 


step  8     if  ~zk  —  z*  <  0  then  stop:  x\  is  the  proposed  solution;  otherwise  goto  step  4 

Figure  4:  Additional  "Stopping  Rule"  Steps  to  the  Algorithm  for  SLP-T 

xt  is  of  sufficiently  high  quality  based  on  sample  variance  estimates  observed  during  the  course  of 
the  algorithm.  Alternatively,  the  resampling  procedure  described  in  Section  2.4  may  be  applied  to 
obtain  a  confidence  interval  of  the  form  (14);  we  can  also  replace  z*  by  the  known  lower  bound  lj 
in  (14). 

4      Empirical  Coverage  Results 

In  this  section  we  present  preliminary  empirical  coverage  results  that  illustrate  dangers  associated 
with  naive  stopping  rules  and  show  empirical  performance  of  the  recommended  stopping  rules  devel- 
oped in  §2.  The  simplistic  "test  problems"  are  motivated  by  Example  1,  and  we  use  a  pseudo-random 
number  generator  [30]  to  directly  form  the  difference  random  variables. 

In  the  first  group  of  these  problems,  the  true  gap  is  fit  =  2/3  for  k  =  1, . . . ,  K  and  fxt  =  0  for 
k  >  K  +  1,  and  the  pre-specified  confidence  interval  width  is  e  =  1/3.  We  terminate  on  the  first 
iteration  in  which  the  difference  random  variable  drops  below  zero;  if  the  algorithm  stops  prior  to 
iteration  K  + 1  it  will  have  stopped  incorrectly.  The  difference  random  variables  are  sample  means  of 
i.i.d.r.v.'s  of  the  form  U(  —  y/Z,  \/3)  +  2/3,  where  U(a,b)  is  a  uniform  random  variable  on  the  interval 
(a,  6).  The  choice  of  (a,  6)  =  (— >/3,  \/3)  yields  or*  =  1.  Table  2  depicts  the  empirical  coverage 
results  for  two  separate  stopping  rules  on  four  test  problems  with  different  values  of  K .   The  first 
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stopping  rule  ignores  the  sequential  nature  of  the  problem  and  uses  the  fixed  sample  size  that  would 
generate  an  asymptotically  valid  95%  one-sided  confidence  interval  of  width  1/3  for  the  true  gap,  if 
the  algorithm  consisted  of  only  one  iteration.  We  choose  a  =  0.05  which  yields  zQ  =  1.645  and  thus 
n  —  (zaCric/e)  =  25.  In  the  second  stopping  rule,  we  increase  the  sample  size  at  each  iteration 
according  to  sample-size  formula  (8).  The  parameter  p  is  selected  from  Table  1  for  each  value  of  K . 
Table  2  is  based  on  1000  replications  and  shows  that  ignoring  the  sequential  nature  of  the  problem 
does  not  lead  to  undercoverage  results  in  this  setting  until  K  grows  large  and  formula  (8)  leads  to 
100%  coverage  in  each  case. 


K 

n  =  25 

nk  =  0(\og'k) 

coverage 

10 

0.997 

1.000 

100 

0.963 

1.000 

1000 

0.692 

1.000 

10000 

0.029 

1.000 

Table  2:  Coverage  Results:  e  =  1/3  and  n  =  2/3 

Next  we  consider  the  case  in  which  m  =  e  =  1/3  for  k  =  1, . . . ,  K.  Thus  the  only  modification 
to  the  previous  group  of  test  problems  is  that  the  difference  random  variables  are  sample  means  of 
i.i.d.r.v.'s  of  the  form  U{— \/3,  a/3)  +  1/3.  This  is  a  more  demanding  test  of  the  stopping  rule  theory 
because  the  true  gap  and  the  confidence  width  are  identical.  The  coverage  results  are  summarized 
in  Table  3.  The  coverage  results  obtained  via  sample-size  formula  (8)  significantly  exceed  the  95% 
confidence  level  in  each  case.  As  one  might  expect,  the  naive  stopping  rule  has  very  poor  coverage 
results  in  this  setting. 


K 

n  =  25 

nk  =  0(\og*k) 

coverage 

10 

0.613 

0.994 

100 

0.007 

0.995 

1000 

0.000 

0.993 

10000 

0.000 

0.989 

Table  3:  Coverage  Results:  e  —  1/3  and  /j.  =  1/3 
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5      Summary 

In  this  paper  we  developed  a  stopping  rule  theory  for  a  class  of  optimization  algorithms  that  estimate 
upper  and  lower  bounds  on  the  optimal  objective  function  value  via  sampling.  While  our  immediate 
motivation  lies  in  developing  stopping  rules  for  a  class  of  Monte  Carlo  sampling-based  stochastic 
programming  algorithms,  the  theory  may  also  be  applicable  to  other  optimization  algorithms  that  use 
simulation  techniques.  In  the  main  result,  we  assume  that  the  difference  rardom  variables  satisfy 
history-dependent  central  limit  theorems  and  provide  appropriate  conditions  and  a  sample-size 
formula  under  which  the  desired  confidence  interval  for  the  objective  function  value  of  the  proposed 
solution  is  asymptotically  valid.  We  regard  the  recommended  procedure  as  conservative  because: 
(i)  underlying  convergence  properties  of  the  optimization  algorithm  are  ignored  in  developing  the 
methodology;  and  (ii)  the  normal  tail  bound  used  to  derive  the  sample-size  formula  is  not  sharp, 
particularly  in  the  early  iterations.  Moreover,  through  well-designed  stopping  cycles  and  because  of 
the  slow  growth  of  the  sample  size  formula,  the  recommended  stopping  rules  are  practical  from  an 
implementation  standpoint. 

The  applicability  of  the  stopping  rule  theory  was  illustrated  on  an  algorithm  for  a  class  of  mul- 
tistage stochastic  linear  programs  (Pereira  and  Pinto  [27])  that  generate  sample  mean  upper  bound 
estimates  and  deterministically  valid  lower  bounds.  In  other  sampling-based  stochastic  program- 
ming algorithms  for  two  stage  programs  (Dantzig  and  Glynn  [6],  Infanger  [19],  Higle  and  Sen  [14]), 
sample  mean  upper  bounds  are  readily  available,  but  lower  bound  estimates  have  proved  more  dif- 
ficult to  analyze.  Development  and  analysis  of  lower  bound  estimators  for  these  sampling-based 
algorithms  remains  an  active  area  of  research  [15,18,25];  the  procedures  we  have  developed  here 
should  be  useful  for  asymptotically  normal  estimators. 
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